Overview

Dataset statistics

Number of variables: 23
Number of observations: 57293
Missing cells: 21802
Missing cells (%): 1.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 10.1 MiB
Average record size in memory: 184.0 B
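Two of these rows are derived from the raw counts. The missing-cell share is missing cells divided by rows × variables, and the average record size is the total in-memory size divided by the row count. The arithmetic check below uses only the figures reported here. Because the 10.1 MiB total is already rounded, the per-record value comes out close to 184.0 B but does not match it exactly.

```python
# Figures copied from the "Dataset statistics" block.
n_vars = 23
n_obs = 57293
missing_cells = 21802

# Share of missing cells over the whole table (rows x variables).
missing_pct = 100 * missing_cells / (n_obs * n_vars)
print(f"Missing cells (%): {missing_pct:.1f}%")

# Average record size = total bytes / number of rows.
# 10.1 MiB is rounded in the report, so this only approximates 184.0 B.
total_bytes = 10.1 * 1024 ** 2
avg_record_bytes = total_bytes / n_obs
print(f"Average record size: {avg_record_bytes:.1f} B")
```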

Variable types

DateTime: 1
Categorical: 4
Numeric: 16
Boolean: 2

Warnings

The report checks several correlation measures separately, so the same column can be flagged more than once:

MinTemp is highly correlated with MaxTemp and 3 other fields
MaxTemp is highly correlated with MinTemp and 3 other fields
Evaporation is highly correlated with MaxTemp and 3 other fields
Sunshine is highly correlated with Humidity3pm and 2 other fields
WindGustSpeed is highly correlated with WindSpeed9am and 1 other field
WindSpeed9am is highly correlated with WindGustSpeed
WindSpeed3pm is highly correlated with WindGustSpeed
Humidity9am is highly correlated with Evaporation and 1 other field
Humidity3pm is highly correlated with Sunshine and 4 other fields
Pressure9am is highly correlated with Pressure3pm
Pressure3pm is highly correlated with MinTemp and 2 other fields
Cloud9am is highly correlated with Sunshine and 2 other fields
Cloud3pm is highly correlated with Sunshine and 2 other fields
Temp9am is highly correlated with MinTemp and 4 other fields
Temp3pm is highly correlated with MinTemp and 4 other fields

MinTemp is highly correlated with MaxTemp and 5 other fields
MaxTemp is highly correlated with MinTemp and 3 other fields
Evaporation is highly correlated with MinTemp and 4 other fields
Sunshine is highly correlated with Humidity9am and 4 other fields
WindGustSpeed is highly correlated with WindSpeed9am and 1 other field
WindSpeed9am is highly correlated with WindGustSpeed
WindSpeed3pm is highly correlated with WindGustSpeed
Humidity9am is highly correlated with Evaporation and 2 other fields
Humidity3pm is highly correlated with Sunshine and 3 other fields
Pressure9am is highly correlated with MinTemp and 1 other field
Pressure3pm is highly correlated with MinTemp and 2 other fields
Cloud9am is highly correlated with Sunshine and 2 other fields
Cloud3pm is highly correlated with Sunshine and 2 other fields
Temp9am is highly correlated with MinTemp and 4 other fields
Temp3pm is highly correlated with MinTemp and 4 other fields

MinTemp is highly correlated with MaxTemp and 2 other fields
MaxTemp is highly correlated with MinTemp and 3 other fields
Evaporation is highly correlated with MaxTemp
Sunshine is highly correlated with Cloud9am and 1 other field
WindGustSpeed is highly correlated with WindSpeed3pm
WindSpeed3pm is highly correlated with WindGustSpeed
Pressure9am is highly correlated with Pressure3pm
Pressure3pm is highly correlated with Pressure9am
Cloud9am is highly correlated with Sunshine and 1 other field
Cloud3pm is highly correlated with Sunshine and 1 other field
Temp9am is highly correlated with MinTemp and 2 other fields
Temp3pm is highly correlated with MinTemp and 2 other fields

Pressure9am is highly correlated with Pressure3pm and 2 other fields
WindGustDir is highly correlated with WindDir3pm and 2 other fields
Sunshine is highly correlated with RainTomorrow and 6 other fields
RainTomorrow is highly correlated with Sunshine and 2 other fields
Cloud9am is highly correlated with Sunshine and 3 other fields
RainToday is highly correlated with Humidity3pm and 1 other field
WindDir3pm is highly correlated with WindGustDir and 2 other fields
WindSpeed9am is highly correlated with WindSpeed3pm and 1 other field
Humidity3pm is highly correlated with Sunshine and 8 other fields
Pressure3pm is highly correlated with Pressure9am and 4 other fields
MaxTemp is highly correlated with Sunshine and 7 other fields
Temp3pm is highly correlated with Sunshine and 7 other fields
WindSpeed3pm is highly correlated with WindSpeed9am and 1 other field
WindGustSpeed is highly correlated with WindSpeed9am and 1 other field
Humidity9am is highly correlated with Sunshine and 7 other fields
Temp9am is highly correlated with Pressure9am and 6 other fields
WindDir9am is highly correlated with WindGustDir and 2 other fields
MinTemp is highly correlated with Pressure9am and 5 other fields
Cloud3pm is highly correlated with Sunshine and 3 other fields
Location is highly correlated with WindGustDir and 8 other fields
Sunshine has 6177 (10.8%) missing values
WindGustDir has 3662 (6.4%) missing values
WindGustSpeed has 3644 (6.4%) missing values
WindDir9am has 1730 (3.0%) missing values
WindDir3pm has 671 (1.2%) missing values
Humidity3pm has 1088 (1.9%) missing values
Cloud3pm has 2631 (4.6%) missing values
Temp3pm has 926 (1.6%) missing values
Rainfall has 37433 (65.3%) zeros
Sunshine has 1574 (2.7%) zeros
WindSpeed9am has 1606 (2.8%) zeros
Cloud9am has 5963 (10.4%) zeros
Cloud3pm has 3573 (6.2%) zeros
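The missing and zero percentages are plain ratios over all 57293 rows. For Rainfall, for example, 37433 / 57293 ≈ 65.3%. The sketch below formats the same kind of lines for a single column. Representing missing values as `None` is an assumption. The report also lists only columns above some threshold, which is not stated here, so the sketch does not try to reproduce that filter.

```python
def missing_and_zero_warnings(name, values):
    """Report-style 'missing' and 'zeros' lines for one column.
    Missing values are assumed to be None; percentages use all rows."""
    n_rows = len(values)
    n_missing = sum(v is None for v in values)
    n_zeros = sum(v == 0 for v in values if v is not None)
    lines = []
    if n_missing:
        lines.append(f"{name} has {n_missing} ({100 * n_missing / n_rows:.1f}%) missing values")
    if n_zeros:
        lines.append(f"{name} has {n_zeros} ({100 * n_zeros / n_rows:.1f}%) zeros")
    return lines

# Hypothetical five-row column.
print(missing_and_zero_warnings("Rainfall", [0, 0, None, 1.2, 3.0]))

# The Rainfall figure from the report: 37433 zeros out of 57293 rows.
print(f"{100 * 37433 / 57293:.1f}%")
```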

Reproduction

Analysis started: 2021-05-15 16:44:04.111249
Analysis finished: 2021-05-15 16:45:06.816107
Duration: 1 minute and 2.7 seconds
Software version: pandas-profiling v3.0.0
Configuration: config.json
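The duration is the difference between the two timestamps. A quick check with the standard library:

```python
from datetime import datetime

started = datetime.fromisoformat("2021-05-15 16:44:04.111249")
finished = datetime.fromisoformat("2021-05-15 16:45:06.816107")

elapsed = (finished - started).total_seconds()  # 62.704858 seconds
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute and {seconds:.1f} seconds")
```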

Variables

Date
Date

Distinct: 3411
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Memory size: 447.7 KiB
Minimum: 2007-11-01 00:00:00
Maximum: 2017-06-25 00:00:00
Histogram with fixed size bins (bins=50)

Location
Categorical

HIGH CORRELATION

Distinct: 30
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 447.7 KiB
Darwin  2525
Perth  2506
Brisbane  2467
PerthAirport  2422
MelbourneAirport  2409
Other values (25)  44964

Length

Max length: 16
Median length: 8
Mean length: 8.718098197
Min length: 4
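The length statistics are computed over non-missing values. For Location, the mean length equals total characters divided by rows (499486 / 57293 ≈ 8.7181). The sketch below uses the five sample rows listed for this variable as stand-in input and treats `None` as missing. Both choices are assumptions made for illustration.

```python
from statistics import mean, median

def length_stats(values):
    """Max/median/mean/min string length over non-missing values (None = missing)."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "Max length": max(lengths),
        "Median length": median(lengths),
        "Mean length": mean(lengths),
        "Min length": min(lengths),
    }

# The five sample rows shown for Location (not the full column).
sample = ["Townsville", "Woomera", "Mildura", "Portland", "Brisbane"]
print(length_stats(sample))

# Cross-check with the report: total characters / rows.
print(499486 / 57293)  # about 8.7181, matching "Mean length"
```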

Characters and Unicode

Total characters: 499486
Distinct characters: 37
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
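The category names come from each character's Unicode general category. Python's `unicodedata.category` returns the short codes (`Lu`, `Ll`). The mapping to long names in this sketch covers only the two categories that occur in this column.

```python
import unicodedata
from collections import Counter

# Long names for the two general categories seen in this column.
CATEGORY_NAMES = {"Lu": "Uppercase Letter", "Ll": "Lowercase Letter"}

def character_categories(values):
    """Count characters by Unicode general category across non-missing strings."""
    counts = Counter()
    for v in values:
        if v is None:
            continue
        for ch in v:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

print(character_categories(["Darwin", "PerthAirport"]))
```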

Unique

Unique: 0
Unique (%): 0.0%
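Distinct counts the different values, while Unique counts the values that occur exactly once. Each of the 30 locations is observed thousands of times, so Unique is 0. The sketch below computes both counts and treats `None` as missing, which is an assumption.

```python
from collections import Counter

def distinct_and_unique(values):
    """Distinct = number of different non-missing values;
    Unique = number of values that occur exactly once."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

print(distinct_and_unique(["Darwin", "Perth", "Darwin", "Sydney", None]))
```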

Sample

1st row: Townsville
2nd row: Woomera
3rd row: Mildura
4th row: Portland
5th row: Brisbane

Common Values

Value  Count  Frequency (%)
Darwin  2525  4.4%
Perth  2506  4.4%
Brisbane  2467  4.3%
PerthAirport  2422  4.2%
MelbourneAirport  2409  4.2%
Watsonia  2385  4.2%
SydneyAirport  2367  4.1%
Mildura  2300  4.0%
Nuriootpa  2290  4.0%
Sydney  2214  3.9%
Other values (20)  33408  58.3%

Length

Histogram of lengths of the category

Words (lowercased)
Value  Count  Frequency (%)
darwin  2525  4.4%
perth  2506  4.4%
brisbane  2467  4.3%
perthairport  2422  4.2%
melbourneairport  2409  4.2%
watsonia  2385  4.2%
sydneyairport  2367  4.1%
mildura  2300  4.0%
nuriootpa  2290  4.0%
sydney  2214  3.9%
Other values (20)  33408  58.3%

Most occurring characters

Value  Count  Frequency (%)
r  57019  11.4%
a  46868  9.4%
o  43103  8.6%
e  37761  7.6%
n  35366  7.1%
i  34732  7.0%
l  25315  5.1%
t  24840  5.0%
b  17039  3.4%
s  14841  3.0%
Other values (27)  162602  32.6%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  425025  85.1%
Uppercase Letter  74461  14.9%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
r  57019  13.4%
a  46868  11.0%
o  43103  10.1%
e  37761  8.9%
n  35366  8.3%
i  34732  8.2%
l  25315  6.0%
t  24840  5.8%
b  17039  4.0%
s  14841  3.5%
Other values (12)  88141  20.7%

Uppercase Letter
Value  Count  Frequency (%)
A  11485  15.4%
M  10171  13.7%
W  10101  13.6%
S  8166  11.0%
C  6779  9.1%
P  6689  9.0%
N  4450  6.0%
B  3250  4.4%
H  3015  4.0%
D  2525  3.4%
Other values (5)  7830  10.5%

Most occurring scripts

Value  Count  Frequency (%)
Latin  499486  100.0%

Most frequent character per script

Latin: same counts and percentages as Most occurring characters, since every character is Latin.

Most occurring blocks

Value  Count  Frequency (%)
ASCII  499486  100.0%

Most frequent character per block

ASCII: same counts and percentages as Most occurring characters, since every character is in the ASCII block.

MinTemp
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 360
Distinct (%): 0.6%
Missing: 57
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.33193794
Minimum: -8
Maximum: 31.4
Zeros: 33
Zeros (%): 0.1%
Negative: 662
Negative (%): 1.2%
Memory size: 447.7 KiB

Quantile statistics

Minimum: -8
5th percentile: 3.1
Q1: 8.5
Median: 13.1
Q3: 18.2
95th percentile: 24.2
Maximum: 31.4
Range: 39.4
Interquartile range (IQR): 9.7
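The IQR is Q3 − Q1 (18.2 − 8.5 = 9.7 for MinTemp), and the range is Maximum − Minimum (31.4 − (−8) = 39.4). The sketch below computes the quartiles with linear interpolation between order statistics. `statistics.quantiles(..., method="inclusive")` uses the same rule as pandas' default `Series.quantile` interpolation. That the report relies on that pandas default is an assumption.

```python
from statistics import quantiles

def quartile_summary(values):
    """Q1, median, Q3, IQR and range of the non-missing values.
    method="inclusive" interpolates linearly between order statistics."""
    data = [v for v in values if v is not None]
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return {"Q1": q1, "median": med, "Q3": q3, "IQR": q3 - q1,
            "Range": max(data) - min(data)}

print(quartile_summary([1, 2, 3, 4, 5, None]))
```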

Descriptive statistics

Standard deviation: 6.451281533
Coefficient of variation (CV): 0.4838967569
Kurtosis: -0.6615751152
Mean: 13.33193794
Median Absolute Deviation (MAD): 4.8
Skewness: 0.03914135149
Sum: 763066.8
Variance: 41.61903341
Monotonicity: Not monotonic
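Several of these rows are linked. The variance is the squared standard deviation (6.451281533² ≈ 41.619), and the CV is the standard deviation divided by the mean (6.451281533 / 13.33193794 ≈ 0.4839). The MAD reported here is consistent with the median of absolute deviations from the median: Rainfall, with more than half its values at zero, shows a MAD of 0. The sketch below assumes the sample standard deviation (ddof = 1), which is pandas' default.

```python
from statistics import mean, median, stdev

def dispersion(values):
    """Sample std (ddof=1), variance, coefficient of variation and
    median absolute deviation of the non-missing values."""
    data = [v for v in values if v is not None]
    m = mean(data)
    sd = stdev(data)
    med = median(data)
    return {
        "std": sd,
        "variance": sd ** 2,
        "cv": sd / m,
        "mad": median(abs(v - med) for v in data),
    }

print(dispersion([1, 2, 3, 4, 5, None]))

# Cross-check against the MinTemp figures above.
print(6.451281533 / 13.33193794)  # CV
print(6.451281533 ** 2)           # variance
```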
Histogram with fixed size bins (bins=50)
Common Values
Value  Count  Frequency (%)
13  367  0.6%
8.5  353  0.6%
10.8  352  0.6%
10.5  349  0.6%
11  344  0.6%
9.6  343  0.6%
12  342  0.6%
12.5  338  0.6%
11.1  332  0.6%
8.9  328  0.6%
Other values (350)  53788  93.9%

Minimum 10 values
Value  Count  Frequency (%)
-8  1  < 0.1%
-6.9  1  < 0.1%
-6.7  1  < 0.1%
-6.5  1  < 0.1%
-6.3  2  < 0.1%
-6.1  1  < 0.1%
-5.8  1  < 0.1%
-5.5  1  < 0.1%
-5.4  2  < 0.1%
-5.3  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
31.4  2  < 0.1%
29.8  1  < 0.1%
29.7  4  < 0.1%
29.6  1  < 0.1%
29.5  1  < 0.1%
29.4  2  < 0.1%
29.3  1  < 0.1%
29.2  2  < 0.1%
29.1  4  < 0.1%
29  4  < 0.1%

MaxTemp
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 392
Distinct (%): 0.7%
Missing: 51
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.14519584
Minimum: 4.1
Maximum: 48.1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 447.7 KiB

Quantile statistics

Minimum: 4.1
5th percentile: 13.5
Q1: 18.6
Median: 23.6
Q3: 29.6
95th percentile: 36
Maximum: 48.1
Range: 44
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 7.055457077
Coefficient of variation (CV): 0.2922095611
Kurtosis: -0.6851616081
Mean: 24.14519584
Median Absolute Deviation (MAD): 5.5
Skewness: 0.2420339528
Sum: 1382119.3
Variance: 49.77947457
Monotonicity: Not monotonic
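Monotonicity checks whether the column, taken in row order, only increases or only decreases. In the minimal sketch below, the labels other than "Not monotonic" and the choice to skip missing values (`None`) are assumptions.

```python
def monotonicity(values):
    """Classify a sequence (in row order) by its monotonic behaviour.
    Missing values (None) are skipped; labels are illustrative."""
    data = [v for v in values if v is not None]
    pairs = list(zip(data, data[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"

print(monotonicity([24.1, 19.0, 30.2]))
```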
Histogram with fixed size bins (bins=50)
Common Values
Value  Count  Frequency (%)
19  327  0.6%
19.5  325  0.6%
19.6  320  0.6%
18.2  318  0.6%
21.5  309  0.5%
23.5  308  0.5%
19.4  307  0.5%
22  305  0.5%
20.1  301  0.5%
19.8  300  0.5%
Other values (382)  54122  94.5%

Minimum 10 values
Value  Count  Frequency (%)
4.1  1  < 0.1%
6.3  1  < 0.1%
7  2  < 0.1%
7.2  1  < 0.1%
7.3  1  < 0.1%
7.5  2  < 0.1%
7.6  2  < 0.1%
7.7  1  < 0.1%
7.8  3  < 0.1%
7.9  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
48.1  1  < 0.1%
47.3  1  < 0.1%
46.8  2  < 0.1%
46.7  2  < 0.1%
46.4  2  < 0.1%
46.3  1  < 0.1%
46.2  1  < 0.1%
46.1  3  < 0.1%
46  2  < 0.1%
45.8  1  < 0.1%

Rainfall
Real number (ℝ≥0)

ZEROS

Distinct: 421
Distinct (%): 0.7%
Missing: 58
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.109057395
Minimum: 0
Maximum: 206.2
Zeros: 37433
Zeros (%): 65.3%
Negative: 0
Negative (%): 0.0%
Memory size: 447.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0.6
95th percentile: 12.2
Maximum: 206.2
Range: 206.2
Interquartile range (IQR): 0.6

Descriptive statistics

Standard deviation: 6.893643046
Coefficient of variation (CV): 3.26858959
Kurtosis: 95.83698141
Mean: 2.109057395
Median Absolute Deviation (MAD): 0
Skewness: 7.449716868
Sum: 120711.9
Variance: 47.52231444
Monotonicity: Not monotonic
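Rainfall's skewness (7.45) and kurtosis (95.8) reflect a column that is 65.3% zeros with a long right tail reaching 206.2. The sketch below implements the bias-adjusted estimators that pandas uses for `Series.skew()` and `Series.kurt()`. Kurtosis here is excess kurtosis, so a normal distribution scores about 0. That the report computes these rows with those pandas methods is an assumption.

```python
from math import sqrt

def skew_kurt(values):
    """Adjusted Fisher-Pearson skewness (G1) and unbiased excess kurtosis (G2)
    of the non-missing values; needs at least 4 values."""
    x = [v for v in values if v is not None]
    n = len(x)
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / (n - 1)  # sample variance (ddof=1)
    s = sqrt(s2)
    g1 = n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in x)
    g2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
          * sum((v - m) ** 4 for v in x) / s2 ** 2
          - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return g1, g2

# Mostly zeros with one large value: strongly right-skewed, like Rainfall.
print(skew_kurt([0, 0, 0, 0, 0.2, 0.6, 12.2]))
```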
Histogram with fixed size bins (bins=50)
Common Values
Value  Count  Frequency (%)
0  37433  65.3%
0.2  2984  5.2%
0.4  1597  2.8%
0.6  1039  1.8%
0.8  852  1.5%
1  681  1.2%
1.2  633  1.1%
1.4  557  1.0%
1.6  473  0.8%
1.8  423  0.7%
Other values (411)  10563  18.4%

Minimum 10 values
Value  Count  Frequency (%)
0  37433  65.3%
0.1  42  0.1%
0.2  2984  5.2%
0.3  22  < 0.1%
0.4  1597  2.8%
0.5  14  < 0.1%
0.6  1039  1.8%
0.7  5  < 0.1%
0.8  852  1.5%
0.9  9  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
206.2  1  < 0.1%
183  1  < 0.1%
182.6  1  < 0.1%
168.4  1  < 0.1%
145  1  < 0.1%
140.2  1  < 0.1%
136.6  1  < 0.1%
129.4  1  < 0.1%
128.2  1  < 0.1%
128  1  < 0.1%

Evaporation
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 321
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.558703507
Minimum: 0
Maximum: 145
Zeros: 149
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 447.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 2.8
Median: 4.8
Q3: 7.4
95th percentile: 12
Maximum: 145
Range: 145
Interquartile range (IQR): 4.6

Descriptive statistics

Standard deviation: 4.146830161
Coefficient of variation (CV): 0.7460067183
Kurtosis: 53.25295264
Mean: 5.558703507
Median Absolute Deviation (MAD): 2.3
Skewness: 3.956221832
Sum: 318474.8
Variance: 17.19620039
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values
Value  Count  Frequency (%)
4  2322  4.1%
8  1893  3.3%
2.2  1421  2.5%
2.6  1365  2.4%
2.4  1351  2.4%
3.2  1351  2.4%
3.4  1348  2.4%
3  1347  2.4%
2  1326  2.3%
2.8  1320  2.3%
Other values (311)  42249  73.7%

Minimum 10 values
Value  Count  Frequency (%)
0  149  0.3%
0.1  6  < 0.1%
0.2  332  0.6%
0.3  6  < 0.1%
0.4  492  0.9%
0.5  6  < 0.1%
0.6  681  1.2%
0.7  13  < 0.1%
0.8  862  1.5%
0.9  16  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
145  1  < 0.1%
81.2  1  < 0.1%
77.3  1  < 0.1%
74.8  1  < 0.1%
72.2  1  < 0.1%
70.4  1  < 0.1%
70  1  < 0.1%
68.8  1  < 0.1%
65.4  1  < 0.1%
64.8  1  < 0.1%

Sunshine
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING
ZEROS

Distinct: 145
Distinct (%): 0.3%
Missing: 6177
Missing (%): 10.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.651782221
Minimum: 0
Maximum: 14.5
Zeros: 1574
Zeros (%): 2.7%
Negative: 0
Negative (%): 0.0%
Memory size: 447.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.3
Q1: 4.9
Median: 8.5
Q3: 10.7
95th percentile: 12.8
Maximum: 14.5
Range: 14.5
Interquartile range (IQR): 5.8

Descriptive statistics

Standard deviation: 3.769488309
Coefficient of variation (CV): 0.4926288021
Kurtosis: -0.8009073849
Mean: 7.651782221
Median Absolute Deviation (MAD): 2.6
Skewness: -0.5258108576
Sum: 391128.5
Variance: 14.20904211
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values
Value  Count  Frequency (%)
0  1574  2.7%
11  786  1.4%
10.7  782  1.4%
10.8  772  1.3%
10.5  737  1.3%
10.3  709  1.2%
10.9  709  1.2%
10.6  708  1.2%
10  703  1.2%
10.4  692  1.2%
Other values (135)  42944  75.0%
(Missing)  6177  10.8%

Minimum 10 values
Value  Count  Frequency (%)
0  1574  2.7%
0.1  371  0.6%
0.2  336  0.6%
0.3  281  0.5%
0.4  230  0.4%
0.5  222  0.4%
0.6  195  0.3%
0.7  232  0.4%
0.8  228  0.4%
0.9  217  0.4%

Maximum 10 values
Value  Count  Frequency (%)
14.5  1  < 0.1%
14.3  3  < 0.1%
14.2  1  < 0.1%
14.1  3  < 0.1%
14  9  < 0.1%
13.9  12  < 0.1%
13.8  37  0.1%
13.7  84  0.1%
13.6  112  0.2%
13.5  117  0.2%

WindGustDir
Categorical

HIGH CORRELATION
MISSING

Distinct: 16
Distinct (%): < 0.1%
Missing: 3662
Missing (%): 6.4%
Memory size: 447.7 KiB
E  4355
SW  3844
W  3838
N  3781
ENE  3767
Other values (11)  34046

Length

Max length: 3
Median length: 2
Mean length: 2.18097742
Min length: 1

Characters and Unicode

Total characters: 116968
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ENE
2nd row: ENE
3rd row: N
4th row: W
5th row: ENE

Common Values

Value  Count  Frequency (%)
E  4355  7.6%
SW  3844  6.7%
W  3838  6.7%
N  3781  6.6%
ENE  3767  6.6%
SE  3711  6.5%
SSW  3671  6.4%
WSW  3645  6.4%
S  3417  6.0%
ESE  3283  5.7%
Other values (6)  16319  28.5%
(Missing)  3662  6.4%

Length

Histogram of lengths of the category

Words (lowercased)
Value  Count  Frequency (%)
e  4355  8.1%
sw  3844  7.2%
w  3838  7.2%
n  3781  7.1%
ene  3767  7.0%
se  3711  6.9%
ssw  3671  6.8%
wsw  3645  6.8%
s  3417  6.4%
ese  3283  6.1%
Other values (6)  16319  30.4%
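The words table lowercases the values and, judging by the percentages, divides by the non-missing count. For `e`, 4355 / (57293 − 3662) ≈ 8.1%, whereas Common Values shows 4355 / 57293 ≈ 7.6%. The sketch below follows that reading. Splitting on whitespace and the absence of any stop-word filtering are assumptions.

```python
from collections import Counter

def word_frequencies(values):
    """Lowercased whitespace-separated tokens from non-missing values,
    with percentages over the total number of tokens."""
    words = Counter()
    for v in values:
        if v is None:
            continue
        words.update(v.lower().split())
    total = sum(words.values())
    return [(w, c, 100 * c / total) for w, c in words.most_common()]

print(word_frequencies(["E", "SW", None, "E", "ENE"]))
```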

Most occurring characters

Value  Count  Frequency (%)
S  31748  27.1%
E  30841  26.4%
W  29197  25.0%
N  25182  21.5%

Most occurring categories

Value  Count  Frequency (%)
Uppercase Letter  116968  100.0%

Most frequent character per category

Uppercase Letter: same counts and percentages as Most occurring characters, since every character is an uppercase letter.

Most occurring scripts

Value  Count  Frequency (%)
Latin  116968  100.0%

Most frequent character per script

Latin: same counts and percentages as Most occurring characters.

Most occurring blocks

Value  Count  Frequency (%)
ASCII  116968  100.0%

Most frequent character per block

ASCII: same counts and percentages as Most occurring characters.

WindGustSpeed
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 64
Distinct (%): 0.1%
Missing: 3644
Missing (%): 6.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.38481612
Minimum: 7
Maximum: 135
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 447.7 KiB

Quantile statistics

Minimum: 7
5th percentile: 22
Q1: 31
Median: 39
Q3: 48
95th percentile: 65
Maximum: 135
Range: 128
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 13.22717302
Coefficient of variation (CV): 0.3275283705
Kurtosis: 1.662944421
Mean: 40.38481612
Median Absolute Deviation (MAD): 8
Skewness: 0.9720947814
Sum: 2166605
Variance: 174.958106
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values
Value  Count  Frequency (%)
35  3862  6.7%
39  3738  6.5%
37  3508  6.1%
33  3457  6.0%
31  3436  6.0%
41  3113  5.4%
30  2798  4.9%
43  2576  4.5%
28  2443  4.3%
44  2156  3.8%
Other values (54)  22562  39.4%
(Missing)  3644  6.4%

Minimum 10 values
Value  Count  Frequency (%)
7  1  < 0.1%
9  1  < 0.1%
11  16  < 0.1%
13  97  0.2%
15  220  0.4%
17  423  0.7%
19  558  1.0%
20  884  1.5%
22  1041  1.8%
24  1471  2.6%

Maximum 10 values
Value  Count  Frequency (%)
135  1  < 0.1%
124  1  < 0.1%
122  2  < 0.1%
120  1  < 0.1%
117  2  < 0.1%
115  2  < 0.1%
113  2  < 0.1%
111  1  < 0.1%
109  4  < 0.1%
107  3  < 0.1%

WindDir9am
Categorical

HIGH CORRELATION
MISSING

Distinct: 16
Distinct (%): < 0.1%
Missing: 1730
Missing (%): 3.0%
Memory size: 447.7 KiB
N  4669
E  4350
SE  3963
W  3757
SSE  3724
Other values (11)  35100

Length

Max length: 3
Median length: 2
Mean length: 2.174864568
Min length: 1

Characters and Unicode

Total characters: 120842
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: SE
2nd row: SSE
3rd row: ESE
4th row: NNW
5th row: SW

Common Values

Value  Count  Frequency (%)
N  4669  8.1%
E  4350  7.6%
SE  3963  6.9%
W  3757  6.6%
SSE  3724  6.5%
ENE  3708  6.5%
ESE  3428  6.0%
S  3391  5.9%
NE  3322  5.8%
SW  3279  5.7%
Other values (6)  17972  31.4%

Length

Histogram of lengths of the category

Words (lowercased)
Value  Count  Frequency (%)
n  4669  8.4%
e  4350  7.8%
se  3963  7.1%
w  3757  6.8%
sse  3724  6.7%
ene  3708  6.7%
ese  3428  6.2%
s  3391  6.1%
ne  3322  6.0%
sw  3279  5.9%
Other values (6)  17972  32.3%

Most occurring characters

Value  Count  Frequency (%)
E  32817  27.2%
S  30266  25.0%
N  30023  24.8%
W  27736  23.0%

Most occurring categories

Value  Count  Frequency (%)
Uppercase Letter  120842  100.0%

Most frequent character per category

Uppercase Letter: same counts and percentages as Most occurring characters, since every character is an uppercase letter.

Most occurring scripts

Value  Count  Frequency (%)
Latin  120842  100.0%

Most frequent character per script

Latin: same counts and percentages as Most occurring characters.

Most occurring blocks

Value  Count  Frequency (%)
ASCII  120842  100.0%

Most frequent character per block

ASCII: same counts and percentages as Most occurring characters.

WindDir3pm
Categorical

HIGH CORRELATION
MISSING

Distinct: 16
Distinct (%): < 0.1%
Missing: 671
Missing (%): 1.2%
Memory size: 447.7 KiB
SE  4141
S  4098
SW  4078
E  3985
ESE  3964
Other values (11)  36356

Length

Max length: 3
Median length: 2
Mean length: 2.201706051
Min length: 1

Characters and Unicode

Total characters: 124665
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ENE
2nd row: ESE
3rd row: ENE
4th row: W
5th row: NE

Common Values

Value  Count  Frequency (%)
SE  4141  7.2%
S  4098  7.2%
SW  4078  7.1%
E  3985  7.0%
ESE  3964  6.9%
WSW  3958  6.9%
W  3873  6.8%
ENE  3771  6.6%
SSW  3502  6.1%
N  3458  6.0%
Other values (6)  17794  31.1%

Length

Histogram of lengths of the category

Words (lowercased)
Value  Count  Frequency (%)
se  4141  7.3%
s  4098  7.2%
sw  4078  7.2%
e  3985  7.0%
ese  3964  7.0%
wsw  3958  7.0%
w  3873  6.8%
ene  3771  6.7%
ssw  3502  6.2%
n  3458  6.1%
Other values (6)  17794  31.4%

Most occurring characters

Value  Count  Frequency (%)
S  33923  27.2%
E  32636  26.2%
W  31353  25.1%
N  26753  21.5%

Most occurring categories

Value  Count  Frequency (%)
Uppercase Letter  124665  100.0%

Most frequent character per category

Uppercase Letter: same counts and percentages as Most occurring characters, since every character is an uppercase letter.

Most occurring scripts

Value  Count  Frequency (%)
Latin  124665  100.0%

Most frequent character per script

Latin: same counts and percentages as Most occurring characters.

Most occurring blocks

Value  Count  Frequency (%)
ASCII  124665  100.0%

Most frequent character per block

ASCII: same counts and percentages as Most occurring characters.

WindSpeed9am
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 37
Distinct (%): 0.1%
Missing: 108
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.04043018
Minimum: 0
Maximum: 67
Zeros: 1606
Zeros (%): 2.8%
Negative: 0
Negative (%): 0.0%
Memory size: 447.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 9
Median: 13
Q3: 20
95th percentile: 30
Maximum: 67
Range: 67
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 8.542563252
Coefficient of variation (CV): 0.5679733324
Kurtosis: 1.201822987
Mean: 15.04043018
Median Absolute Deviation (MAD): 6
Skewness: 0.8179554806
Sum: 860087
Variance: 72.97538692
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=37)

Common Values
Value  Count  Frequency (%)
13  5871  10.2%
9  5592  9.8%
11  4932  8.6%
17  4858  8.5%
15  4663  8.1%
7  4219  7.4%
19  3743  6.5%
20  3483  6.1%
6  3451  6.0%
24  2381  4.2%
Other values (27)  13992  24.4%

Minimum 10 values
Value  Count  Frequency (%)
0  1606  2.8%
2  1270  2.2%
4  1802  3.1%
6  3451  6.0%
7  4219  7.4%
9  5592  9.8%
11  4932  8.6%
13  5871  10.2%
15  4663  8.1%
17  4858  8.5%

Maximum 10 values
Value  Count  Frequency (%)
67  2  < 0.1%
65  2  < 0.1%
63  5  < 0.1%
61  4  < 0.1%
59  4  < 0.1%
57  13  < 0.1%
56  16  < 0.1%
54  18  < 0.1%
52  29  0.1%
50  42  0.1%

WindSpeed3pm
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 39
Distinct (%): 0.1%
Missing: 466
Missing (%): 0.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.34114065
Minimum: 0
Maximum: 76
Zeros: 190
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 447.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 7
Q1: 13
Median: 19
Q3: 24
95th percentile: 35
Maximum: 76
Range: 76
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 8.539047479
Coefficient of variation (CV): 0.4414965814
Kurtosis: 0.5874802447
Mean: 19.34114065
Median Absolute Deviation (MAD): 6
Skewness: 0.5954250177
Sum: 1099099
Variance: 72.91533184
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=39)

Common Values
Value  Count  Frequency (%)
17  5220  9.1%
20  5010  8.7%
13  4984  8.7%
15  4585  8.0%
19  4453  7.8%
24  3988  7.0%
11  3769  6.6%
22  3592  6.3%
9  3548  6.2%
28  2911  5.1%
Other values (29)  14767  25.8%

Minimum 10 values
Value  Count  Frequency (%)
0  190  0.3%
2  270  0.5%
4  520  0.9%
6  1212  2.1%
7  2024  3.5%
9  3548  6.2%
11  3769  6.6%
13  4984  8.7%
15  4585  8.0%
17  5220  9.1%

Maximum 10 values
Value  Count  Frequency (%)
76  1  < 0.1%
69  1  < 0.1%
67  1  < 0.1%
65  6  < 0.1%
63  3  < 0.1%
61  4  < 0.1%
59  7  < 0.1%
57  7  < 0.1%
56  13  < 0.1%
54  25  < 0.1%

Humidity9am
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 100
Distinct (%): 0.2%
Missing: 193
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 66.30966725
Minimum: 1
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 447.7 KiB

Quantile statistics

Minimum: 1
5th percentile: 31
Q1: 55
Median: 67
Q3: 80
95th percentile: 95
Maximum: 100
Range: 99
Interquartile range (IQR): 25

Descriptive statistics

Standard deviation: 18.68403027
Coefficient of variation (CV): 0.2817693263
Kurtosis: 0.07275696992
Mean: 66.30966725
Median Absolute Deviation (MAD): 12
Skewness: -0.5008167492
Sum: 3786282
Variance: 349.092987
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values
Value  Count  Frequency (%)
69  1289  2.2%
67  1282  2.2%
64  1278  2.2%
65  1269  2.2%
66  1257  2.2%
68  1241  2.2%
71  1237  2.2%
70  1221  2.1%
63  1210  2.1%
62  1201  2.1%
Other values (90)  44615  77.9%

Minimum 10 values
Value  Count  Frequency (%)
1  3  < 0.1%
2  3  < 0.1%
3  8  < 0.1%
4  12  < 0.1%
5  14  < 0.1%
6  24  < 0.1%
7  27  < 0.1%
8  31  0.1%
9  36  0.1%
10  42  0.1%

Maximum 10 values
Value  Count  Frequency (%)
100  524  0.9%
99  655  1.1%
98  229  0.4%
97  530  0.9%
96  572  1.0%
95  520  0.9%
94  596  1.0%
93  684  1.2%
92  622  1.1%
91  708  1.2%

Humidity3pm
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 101
Distinct (%): 0.2%
Missing: 1088
Missing (%): 1.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.69636153
Minimum: 0
Maximum: 100
Zeros: 3
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 447.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 15
Q1: 35
Median: 51
Q3: 64
95th percentile: 85
Maximum: 100
Range: 100
Interquartile range (IQR): 29

Descriptive statistics

Standard deviation: 20.55299218
Coefficient of variation (CV): 0.4135713671
Kurtosis: -0.5220448319
Mean: 49.69636153
Median Absolute Deviation (MAD): 14
Skewness: 0.005749433919
Sum: 2793184
Variance: 422.4254876
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values
Value  Count  Frequency (%)
52  1159  2.0%
54  1149  2.0%
53  1123  2.0%
57  1111  1.9%
51  1103  1.9%
56  1099  1.9%
58  1089  1.9%
55  1081  1.9%
59  1059  1.8%
50  1054  1.8%
Other values (91)  45178  78.9%
(Missing)  1088  1.9%

Minimum 10 values
Value  Count  Frequency (%)
0  3  < 0.1%
1  15  < 0.1%
2  29  0.1%
3  40  0.1%
4  72  0.1%
5  98  0.2%
6  152  0.3%
7  160  0.3%
8  209  0.4%
9  217  0.4%

Maximum 10 values
Value  Count  Frequency (%)
100  64  0.1%
99  71  0.1%
98  53  0.1%
97  91  0.2%
96  131  0.2%
95  147  0.3%
94  184  0.3%
93  214  0.4%
92  240  0.4%
91  246  0.4%

Pressure9am
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 507
Distinct (%): 0.9%
Missing: 108
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1017.369544
Minimum: 980.5
Maximum: 1040.9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 447.7 KiB

Quantile statistics

Minimum: 980.5
5th percentile: 1006.4
Q1: 1012.7
Median: 1017.3
Q3: 1022
95th percentile: 1028.9
Maximum: 1040.9
Range: 60.4
Interquartile range (IQR): 9.3

Descriptive statistics

Standard deviation: 6.92722356
Coefficient of variation (CV): 0.006808955111
Kurtosis: 0.2495532787
Mean: 1017.369544
Median Absolute Deviation (MAD): 4.6
Skewness: -0.04339346896
Sum: 58178277.4
Variance: 47.98642625
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values
Value  Count  Frequency (%)
1016.4  380  0.7%
1017.7  369  0.6%
1015.5  360  0.6%
1017.3  354  0.6%
1017.2  353  0.6%
1017.9  349  0.6%
1018  347  0.6%
1015.9  344  0.6%
1016.3  344  0.6%
1018.7  343  0.6%
Other values (497)  53642  93.6%

Minimum 10 values
Value  Count  Frequency (%)
980.5  1  < 0.1%
982  1  < 0.1%
982.2  1  < 0.1%
982.9  1  < 0.1%
983.9  1  < 0.1%
984.6  2  < 0.1%
986.3  1  < 0.1%
986.6  1  < 0.1%
986.7  1  < 0.1%
987  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
1040.9  1  < 0.1%
1040.6  1  < 0.1%
1040.3  1  < 0.1%
1040.2  1  < 0.1%
1040.1  1  < 0.1%
1040  1  < 0.1%
1039.6  1  < 0.1%
1039.5  1  < 0.1%
1039.3  2  < 0.1%
1039.2  3  < 0.1%

Pressure3pm
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 510
Distinct (%): 0.9%
Missing: 114
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1014.898126
Minimum: 977.1
Maximum: 1038.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 447.7 KiB

Quantile statistics

Minimum: 977.1
5th percentile: 1004.1
Q1: 1010.1
Median: 1014.8
Q3: 1019.6
95th percentile: 1026.5
Maximum: 1038.5
Range: 61.4
Interquartile range (IQR): 9.5

Descriptive statistics

Standard deviation: 6.912029717
Coefficient of variation (CV): 0.006810565062
Kurtosis: 0.1360756717
Mean: 1014.898126
Median Absolute Deviation (MAD): 4.7
Skewness: 0.01593496393
Sum: 58030859.94
Variance: 47.77615481
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values
Value  Count  Frequency (%)
1015.8  350  0.6%
1015.3  346  0.6%
1014.8  344  0.6%
1015.5  343  0.6%
1013.4  342  0.6%
1011.9  341  0.6%
1015.6  338  0.6%
1011.7  334  0.6%
1013.5  334  0.6%
1011.8  328  0.6%
Other values (500)  53779  93.9%

Minimum 10 values
Value  Count  Frequency (%)
977.1  1  < 0.1%
978.2  1  < 0.1%
979  1  < 0.1%
980.2  1  < 0.1%
981.4  1  < 0.1%
981.9  1  < 0.1%
982.2  1  < 0.1%
983.2  1  < 0.1%
983.3  1  < 0.1%
984  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
1038.5  1  < 0.1%
1038.2  1  < 0.1%
1037.9  1  < 0.1%
1037.8  1  < 0.1%
1037.6  1  < 0.1%
1037.3  1  < 0.1%
1037.2  1  < 0.1%
1037.1  1  < 0.1%
1037  1  < 0.1%
1036.9  2  < 0.1%

Cloud9am
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.212189971
Minimum: 0
Maximum: 9
Zeros: 5963
Zeros (%): 10.4%
Negative: 0
Negative (%): 0.0%
Memory size: 447.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 5
Q3: 7
95th percentile: 8
Maximum: 9
Range: 9
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 2.844811419
Coefficient of variation (CV): 0.6753758588
Kurtosis: -1.575193654
Mean: 4.212189971
Median Absolute Deviation (MAD): 2
Skewness: -0.1380027147
Sum: 241329
Variance: 8.092952009
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)

Common Values
Value  Count  Frequency (%)
7  13856  24.2%
1  10843  18.9%
8  6347  11.1%
0  5963  10.4%
6  5411  9.4%
2  4403  7.7%
3  4074  7.1%
5  3635  6.3%
4  2760  4.8%
9  1  < 0.1%

Minimum 10 values
Value  Count  Frequency (%)
0  5963  10.4%
1  10843  18.9%
2  4403  7.7%
3  4074  7.1%
4  2760  4.8%
5  3635  6.3%
6  5411  9.4%
7  13856  24.2%
8  6347  11.1%
9  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
9  1  < 0.1%
8  6347  11.1%
7  13856  24.2%
6  5411  9.4%
5  3635  6.3%
4  2760  4.8%
3  4074  7.1%
2  4403  7.7%
1  10843  18.9%
0  5963  10.4%

Cloud3pm
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING
ZEROS

Distinct: 10
Distinct (%): < 0.1%
Missing: 2631
Missing (%): 4.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.333412608
Minimum: 0
Maximum: 9
Zeros: 3573
Zeros (%): 6.2%
Negative: 0
Negative (%): 0.0%
Memory size: 447.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
Median: 5
Q3: 7
95th percentile: 8
Maximum: 9
Range: 9
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.669629123
Coefficient of variation (CV): 0.6160569888
Kurtosis: -1.469746892
Mean: 4.333412608
Median Absolute Deviation (MAD): 2
Skewness: -0.1873986842
Sum: 236873
Variance: 7.126919654
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)

Common Values
Value  Count  Frequency (%)
7  12945  22.6%
1  9741  17.0%
6  5994  10.5%
8  5139  9.0%
2  4728  8.3%
3  4675  8.2%
5  4487  7.8%
0  3573  6.2%
4  3379  5.9%
9  1  < 0.1%
(Missing)  2631  4.6%

Minimum 10 values
Value  Count  Frequency (%)
0  3573  6.2%
1  9741  17.0%
2  4728  8.3%
3  4675  8.2%
4  3379  5.9%
5  4487  7.8%
6  5994  10.5%
7  12945  22.6%
8  5139  9.0%
9  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
9  1  < 0.1%
8  5139  9.0%
7  12945  22.6%
6  5994  10.5%
5  4487  7.8%
4  3379  5.9%
3  4675  8.2%
2  4728  8.3%
1  9741  17.0%
0  3573  6.2%

Temp9am
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct381
Distinct (%)0.7%
Missing60
Missing (%)0.1%
Infinite0
Infinite (%)0.0%
Mean18.061171
Minimum-1.2
Maximum39
Zeros2
Zeros (%)< 0.1%
Negative10
Negative (%)< 0.1%
Memory size447.7 KiB

Quantile statistics

Minimum-1.2
5-th percentile8.1
Q113
median17.6
Q323
95-th percentile29.1
Maximum39
Range40.2
Interquartile range (IQR)10

Descriptive statistics

Standard deviation6.57070627
Coefficient of variation (CV)0.3638028934
Kurtosis-0.7225293096
Mean18.061171
Median Absolute Deviation (MAD)5
Skewness0.1301311901
Sum1033695
Variance43.17418088
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count   Frequency (%)
17                  362     0.6%
15                  344     0.6%
16.6                341     0.6%
14.8                336     0.6%
16.8                332     0.6%
13.8                328     0.6%
15.9                327     0.6%
17.2                326     0.6%
13.2                323     0.6%
16                  322     0.6%
Other values (371)  53892   94.1%
Minimum 10 values
Value     Count   Frequency (%)
-1.2      1       < 0.1%
-1        1       < 0.1%
-0.7      2       < 0.1%
-0.5      1       < 0.1%
-0.3      1       < 0.1%
-0.2      2       < 0.1%
-0.1      2       < 0.1%
0         2       < 0.1%
0.1       1       < 0.1%
0.2       1       < 0.1%
Maximum 10 values
Value     Count   Frequency (%)
39        1       < 0.1%
37.9      1       < 0.1%
37.7      3       < 0.1%
37.6      1       < 0.1%
37.3      1       < 0.1%
37.2      3       < 0.1%
36.9      1       < 0.1%
36.8      2       < 0.1%
36.6      1       < 0.1%
36.5      2       < 0.1%

Temp3pm
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 396
Distinct (%): 0.7%
Missing: 926
Missing (%): 1.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.57862757
Minimum: 3.7
Maximum: 46.1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 447.7 KiB

Quantile statistics

Minimum: 3.7
5th percentile: 12.2
Q1: 17.2
Median: 22
Q3: 27.7
95th percentile: 34.1
Maximum: 46.1
Range: 42.4
Interquartile range (IQR): 10.5

Descriptive statistics

Standard deviation: 6.872913944
Coefficient of variation (CV): 0.3043991015
Kurtosis: -0.5889203397
Mean: 22.57862757
Median Absolute Deviation (MAD): 5.2
Skewness: 0.2640913415
Sum: 1272689.5
Variance: 47.23694608
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count   Frequency (%)
19                  349     0.6%
18.5                342     0.6%
20                  331     0.6%
18.4                331     0.6%
17.8                329     0.6%
20.4                324     0.6%
16.8                321     0.6%
18.7                320     0.6%
17                  319     0.6%
19.2                317     0.6%
Other values (386)  53084   92.7%
(Missing)           926     1.6%
Minimum 10 values
Value     Count   Frequency (%)
3.7       1       < 0.1%
4.3       1       < 0.1%
4.6       1       < 0.1%
4.8       1       < 0.1%
5.1       1       < 0.1%
5.3       2       < 0.1%
5.5       1       < 0.1%
5.8       1       < 0.1%
6         2       < 0.1%
6.1       2       < 0.1%
Maximum 10 values
Value     Count   Frequency (%)
46.1      2       < 0.1%
45.8      2       < 0.1%
45.4      1       < 0.1%
45.3      1       < 0.1%
45.2      1       < 0.1%
45        1       < 0.1%
44.9      1       < 0.1%
44.8      3       < 0.1%
44.7      1       < 0.1%
44.5      2       < 0.1%

RainToday
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 58
Missing (%): 0.1%
Memory size: 112.0 KiB
Value     Count   Frequency (%)
False     44678   78.0%
True      12557   21.9%
(Missing) 58      0.1%

RainTomorrow
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 56.1 KiB
Value     Count   Frequency (%)
False     44767   78.1%
True      12526   21.9%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for an exact linear relationship r is +1 or -1 regardless of the slope of the line, as long as the line is neither horizontal nor vertical.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
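A plain-Python sketch of that definition, using the Temp9am and Temp3pm values from the last ten sample rows of this report (a tiny illustrative subset, not the full-data coefficient). It also checks the location and scale invariance mentioned above by converting Temp9am to Fahrenheit. The function name is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations.

    The 1/n (or 1/(n-1)) factors cancel, so plain sums are used.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Temp9am / Temp3pm from the last ten sample rows of this report.
temp9am = [15.2, 29.0, 26.4, 20.0, 17.1, 13.4, 19.4, 13.4, 24.7, 10.1]
temp3pm = [16.1, 30.5, 35.0, 28.2, 25.8, 18.3, 22.5, 21.6, 26.6, 13.1]

r = pearson_r(temp9am, temp3pm)
# Shifting and rescaling one variable by a positive factor leaves r unchanged.
r_fahrenheit = pearson_r([t * 9 / 5 + 32 for t in temp9am], temp3pm)
print(round(r, 3), math.isclose(r, r_fahrenheit))
```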

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
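The rank-then-correlate recipe above can be sketched in plain Python. Tied values, which are common in a column with few distinct values such as the 0 to 9 variable profiled earlier, are given the average of the ranks they span; that is the usual convention and an assumption of this sketch. Helper names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                           * sum((y - mean_y) ** 2 for y in ys))


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relation: rho is 1 while Pearson's r stays below 1.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```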

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
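A direct O(n²) sketch of the pair-counting definition above. As described, this is the τ-a variant: a pair tied in X or Y counts as neither concordant nor discordant, and the denominator is not adjusted. Library implementations such as SciPy's `scipy.stats.kendalltau` default to the tie-adjusted τ-b, so values can differ when ties are present. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs, with no tie adjustment."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0: a tie in X or Y, counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


# 2 concordant pairs and 1 discordant pair out of 3, so (2 - 1) / 3.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```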

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma (2013).
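A sketch of the bias correction, following the formulas as commonly stated for Bergsma (2013): φ² = χ²/n is reduced by (k - 1)(r - 1)/(n - 1) and floored at 0, and the table dimensions are shrunk in a similar way before taking the square root. The chi-square here omits any continuity correction, which is an assumption; the profiling library's exact implementation may differ. The 2x2 table is tallied from RainToday and RainTomorrow in the last ten sample rows, so it is only a toy illustration, not the report's value. Function names are illustrative.

```python
import math


def chi_square(table):
    """Pearson chi-square (no continuity correction); assumes nonzero margins."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V (Bergsma 2013) for an r x k contingency table."""
    chi2, n = chi_square(table)
    r, k = len(table), len(table[0])
    # Subtract the expected bias of phi^2 under independence, floored at 0.
    phi2 = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    # Shrink the table dimensions with the matching correction.
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2 / denominator) if denominator > 0 else 0.0


# Rows: RainToday Yes / No; columns: RainTomorrow Yes / No.
# Tallied from the last ten sample rows only, so this is a toy example.
rain_table = [[3, 0],
              [1, 6]]
print(round(cramers_v_corrected(rain_table), 3))
```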

Missing values

The count chart gives a simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
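A minimal sketch of the nullity correlation behind the heatmap, assuming it is the Pearson correlation between per-column missingness indicators (1 = missing, 0 = present), which is how the missingno library builds its heatmap. Columns that are never or always missing are skipped because their correlation is undefined. The column values below are hypothetical; only the column names come from this dataset.

```python
import math
from itertools import combinations


def nullity_correlation(columns):
    """Pearson correlation between the missingness indicators of each column pair.

    `columns` maps a column name to a list of values, with None marking a
    missing cell. Columns whose indicator has zero variance are skipped.
    """
    indicators = {name: [1 if v is None else 0 for v in values]
                  for name, values in columns.items()}
    usable = {name: ind for name, ind in indicators.items()
              if 0 < sum(ind) < len(ind)}
    result = {}
    for a, b in combinations(sorted(usable), 2):
        xs, ys = usable[a], usable[b]
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        var_y = sum((y - mean_y) ** 2 for y in ys)
        result[(a, b)] = cov / math.sqrt(var_x * var_y)
    return result


# Hypothetical data: the two cloud columns go missing together, Sunshine is
# missing exactly when they are present, and Temp9am is complete (skipped).
sample = {
    "Cloud9am": [3.0, None, 2.0, None, 8.0],
    "Cloud3pm": [3.0, None, 6.0, None, 4.0],
    "Sunshine": [None, 7.3, None, 6.5, None],
    "Temp9am": [26.4, 17.7, 20.5, 15.2, 13.4],
}
for pair, value in nullity_correlation(sample).items():
    print(pair, round(value, 2))
```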

Sample

First rows

DateLocationMinTempMaxTempRainfallEvaporationSunshineWindGustDirWindGustSpeedWindDir9amWindDir3pmWindSpeed9amWindSpeed3pmHumidity9amHumidity3pmPressure9amPressure3pmCloud9amCloud3pmTemp9amTemp3pmRainTodayRainTomorrow
02015-05-21Townsville21.129.20.04.89.5ENE30.0SEENE13.026.069.066.01018.81014.53.03.026.427.7NoNo
12014-03-26Woomera13.526.70.011.67.3ENE33.0SSEESE15.015.060.024.01018.71015.42.06.017.725.3NoNo
22017-01-19Mildura17.136.30.011.212.2N48.0ESEENE15.011.035.019.01011.61006.22.01.020.533.9NoYes
32009-03-25Portland9.321.04.43.06.5W31.0NNWW7.020.098.072.01015.91015.68.04.015.220.1YesNo
42010-08-08Brisbane6.622.20.04.010.1ENE19.0SWNE6.09.058.037.01022.41018.31.01.013.421.4NoNo
52009-01-03Watsonia5.221.40.06.210.5SSW28.0SESW6.017.059.044.01020.71017.41.07.012.919.9NoNo
62009-08-08Hobart6.815.20.60.62.7NNW44.0NNWNW19.019.060.057.01012.71010.97.07.011.413.6NoYes
72009-09-30Cairns21.230.90.07.210.2E35.0SSEE17.024.054.040.01016.21014.31.01.027.129.1NoNo
82008-10-22Brisbane16.525.86.67.49.4ENE37.0SENE7.017.058.056.01015.21009.87.03.021.523.4YesNo
92010-01-20Melbourne14.528.20.06.410.6SSE39.0NWSSE13.020.062.032.01011.41009.63.04.018.225.3NoNo

Last rows

,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,WindDir9am,WindDir3pm,WindSpeed9am,WindSpeed3pm,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday,RainTomorrow
57283,2016-09-07,Perth,12.3,17.5,6.2,3.0,7.9,W,57.0,WSW,WNW,20.0,19.0,62.0,54.0,1012.2,1013.9,4.0,6.0,15.2,16.1,Yes,Yes
57284,2013-12-19,Townsville,24.4,32.3,0.0,14.8,9.7,E,59.0,SE,E,24.0,39.0,52.0,48.0,1014.5,1012.4,4.0,3.0,29.0,30.5,No,No
57285,2013-10-13,Williamtown,15.8,35.9,0.0,25.6,NaN,NW,81.0,NNW,NW,15.0,48.0,42.0,10.0,1003.4,998.2,4.0,1.0,26.4,35.0,No,Yes
57286,2011-12-27,Woomera,14.9,30.2,0.0,15.0,NaN,SE,50.0,SSE,S,31.0,22.0,47.0,16.0,1015.5,1012.5,0.0,0.0,20.0,28.2,No,No
57287,2013-01-01,MelbourneAirport,12.1,28.9,0.0,7.0,13.0,SSW,50.0,SSW,SSE,7.0,20.0,67.0,40.0,1013.8,1011.5,3.0,7.0,17.1,25.8,No,No
57288,2016-01-29,MelbourneAirport,13.0,19.1,14.6,7.8,1.0,S,28.0,SSW,SSE,9.0,13.0,96.0,67.0,1003.4,1001.8,7.0,7.0,13.4,18.3,Yes,Yes
57289,2013-11-24,NorfolkIsland,17.6,22.8,1.4,3.8,1.2,ESE,31.0,ESE,ESE,13.0,15.0,92.0,82.0,1013.3,1010.9,8.0,7.0,19.4,22.5,Yes,Yes
57290,2014-08-10,AliceSprings,2.1,22.9,0.0,5.6,10.9,ESE,35.0,NW,SE,4.0,20.0,27.0,20.0,1027.6,1023.7,0.0,0.0,13.4,21.6,No,No
57291,2015-01-16,Sydney,21.2,27.6,0.0,8.2,12.9,ENE,33.0,WSW,E,6.0,22.0,67.0,63.0,1008.6,1005.8,3.0,2.0,24.7,26.6,No,No
57292,2012-08-26,Watsonia,7.9,14.8,0.8,3.2,4.4,W,39.0,W,SW,15.0,15.0,65.0,56.0,1020.1,1020.2,6.0,7.0,10.1,13.1,No,No